26 research outputs found

    Enhanced Productivity Using the Cray Performance Analysis Toolset

    Get PDF
    Abstract The purpose of an application performance analysis tool is to help the user identify whether or not their application is running efficiently on the computing resources available. However, the scale of current and future high-end systems, as well as increasing system software and architecture complexity, brings a new set of challenges to today's performance tools. In order to achieve high performance on these peta-scale computing systems, users need a new infrastructure for performance analysis that can handle the challenges associated with multiple levels of parallelism, hundreds of thousands of computing elements, and novel programming paradigms that result in the collection of massive sets of performance data. In this paper we present the Cray Performance Analysis Toolset, which is set on an evolutionary path to address the application performance analysis challenges associated with these massive computing systems by highlighting relevant data and by bringing Cray optimization knowledge to a wider set of users.

    Supporting Relative Debugging for Large-scale UPC Programs

    Get PDF
    Abstract Relative debugging is a useful technique for locating errors that emerge when existing code is ported to a new programming language or to a new computing platform. Recent attention to the UPC programming language has resulted in a number of conventional parallel programs, for example MPI programs, being ported to UPC. This paper gives an overview of the data distribution concepts used in UPC and establishes the challenges in supporting the relative debugging technique for UPC programs that run on large supercomputers. The proposed solution is implemented on an existing parallel relative debugger, CCDB, and its performance is evaluated on a Cray XE6 system with 16,348 cores.
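
    As a rough illustration of the blocked data distribution the abstract refers to, the C sketch below maps a global index of a UPC-style shared array (declared with block size B) onto an owning thread and a local offset, following the standard UPC round-robin-of-blocks layout rule. It is an illustration only, not code from the debugger, and the names and the example parameters are assumptions.

        #include <stdio.h>

        /* Illustration: for a UPC declaration such as
         *   shared [B] double A[N];
         * elements are grouped into blocks of B and the blocks are dealt
         * round-robin across THREADS threads.                             */
        typedef struct {
            int  thread;        /* owning UPC thread                       */
            long local_offset;  /* position inside that thread's slice     */
        } placement_t;

        placement_t place(long i, long B, int THREADS)
        {
            placement_t p;
            long block = i / B;                  /* which block of size B  */
            p.thread = (int)(block % THREADS);   /* round-robin over blocks */
            p.local_offset = (block / THREADS) * B + (i % B);
            return p;
        }

        int main(void)
        {
            /* Example: block size 4, 8 threads, locate element 37. */
            placement_t p = place(37, 4, 8);
            printf("element 37 -> thread %d, local offset %ld\n",
                   p.thread, p.local_offset);
            return 0;
        }

    A relative debugger has to invert exactly this kind of mapping in order to compare "the same" array element across two programs whose data distributions differ.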

    An Implementation of the POMP Performance Monitoring Interface for OpenMP Based on Dynamic Probes

    No full text
    Abstract. OpenMP has emerged as the standard for shared memory parallel programming. Unfortunately, it does not provide a standardized performance monitoring interface with which users and tool builders could write portable libraries for performance measurement of OpenMP programs. In this paper we present an implementation of a performance monitoring interface for OpenMP, based on the POMP proposal, which is built on top of DPCL, an infrastructure for binary and dynamic instrumentation. We also present overhead measurements of our implementation and show examples of its use with two versions of POMP-compliant libraries.
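
    To make the idea of such a monitoring interface concrete, here is a minimal C sketch of what a POMP-style callback library might look like: the tool implements enter/exit hooks for a parallel region, and the instrumentation layer (source rewriting, or dynamic probes as in this paper) arranges for them to be called. The function names, region identifiers, and bookkeeping below are illustrative placeholders, not the exact signatures of the POMP proposal.

        #include <stdio.h>
        #include <omp.h>

        /* Illustrative placeholder names -- not the literal POMP API.      */
        #define MAX_THREADS 256
        static double region_start[MAX_THREADS];
        static double region_time[MAX_THREADS];

        /* Called by the instrumentation layer when a thread enters a
         * monitored parallel region.                                       */
        void monitor_parallel_begin(int region_id)
        {
            (void)region_id;
            region_start[omp_get_thread_num()] = omp_get_wtime();
        }

        /* Called when the thread leaves the region: accumulate wall time.  */
        void monitor_parallel_end(int region_id)
        {
            (void)region_id;
            int t = omp_get_thread_num();
            region_time[t] += omp_get_wtime() - region_start[t];
        }

        int main(void)
        {
            #pragma omp parallel
            {
                monitor_parallel_begin(0);  /* a dynamic probe would insert this */
                /* ... user work ... */
                monitor_parallel_end(0);
            }
            for (int t = 0; t < omp_get_max_threads(); t++)
                printf("thread %d: %.6f s in region 0\n", t, region_time[t]);
            return 0;
        }

    With dynamic instrumentation, the calls in main would not appear in the source at all; the probe infrastructure inserts them into the running binary.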

    Data centric highly parallel debugging

    No full text
    Debugging parallel programs is an order of magnitude more complex than debugging sequential ones, and yet most parallel debuggers provide little more functionality than their sequential counterparts. This problem becomes more serious as computational codes become more complex, involving larger data structures, and as the machines become larger. Peta-scale machines consisting of millions of cores pose a significant challenge for existing techniques. We argue that debugging must become more data-centric, and believe that "assertions" provide a useful model. Assertions allow a user to declare their expectations about the program state as a whole rather than focusing on the state of a single process. Previously, we implemented a special type of assertion that supports debugging applications as they evolve or are ported to different platforms. These assertions allow a user to compare the state of one program against another reference version. These 'relative debugging' assertions, whilst powerful, pose significant implementation challenges for large peta-scale machines. In this paper we discuss a hashing technique that provides a scalable solution for very large problems on very large machines. We illustrate the scheme on 65k cores of Kraken, a Cray XT5 at the University of Tennessee.
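
    The general flavour of the hashing idea can be sketched as follows: each process hashes its local share of a distributed array as (global index, value) pairs, combines the per-element hashes with XOR so that the result is independent of element order and of how the data is distributed, and a single reduction produces one signature per run that can be compared between the two program versions without moving the data itself. This is only a minimal MPI sketch under those assumptions, not the algorithm implemented in the paper.

        #include <stdint.h>
        #include <stdio.h>
        #include <string.h>
        #include <mpi.h>

        /* FNV-1a hash of a (global index, value) pair. */
        static uint64_t hash_elem(long gidx, double val)
        {
            unsigned char buf[sizeof gidx + sizeof val];
            memcpy(buf, &gidx, sizeof gidx);
            memcpy(buf + sizeof gidx, &val, sizeof val);
            uint64_t h = 1469598103934665603ULL;
            for (size_t i = 0; i < sizeof buf; i++) {
                h ^= buf[i];
                h *= 1099511628211ULL;
            }
            return h;
        }

        int main(int argc, char **argv)
        {
            MPI_Init(&argc, &argv);
            int rank; MPI_Comm_rank(MPI_COMM_WORLD, &rank);

            /* Stand-in for this process's slice of a distributed array. */
            const long n_local = 1000, gbase = (long)rank * n_local;
            double local[1000];
            for (long i = 0; i < n_local; i++) local[i] = (double)(gbase + i);

            /* XOR-combine per-element hashes: order and distribution
             * independent, so two differently decomposed runs agree.    */
            uint64_t sig = 0;
            for (long i = 0; i < n_local; i++)
                sig ^= hash_elem(gbase + i, local[i]);

            uint64_t global_sig = 0;
            MPI_Reduce(&sig, &global_sig, 1, MPI_UINT64_T, MPI_BXOR,
                       0, MPI_COMM_WORLD);
            if (rank == 0)
                printf("array signature: %016llx\n",
                       (unsigned long long)global_sig);

            MPI_Finalize();
            return 0;
        }

    Comparing two 8-byte signatures instead of two full arrays is what makes the comparison affordable at tens of thousands of cores.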

    Compile-time Based Performance Prediction

    No full text
    In this paper we present results obtained using a compiler to predict the performance of scientific codes. The compiler, Polaris [3], is both the primary tool for estimating the performance of a range of codes and the beneficiary of the results obtained from predicting program behavior at compile time. We show that a simple compile-time model, augmented with profiling data obtained using very light instrumentation, can be accurate to within 20% (on average) of the measured performance for codes using both dense and sparse computational methods.
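
    The flavour of such a model can be conveyed with a small sketch: the compiler counts, per loop nest, how often each operation class executes (resolving input-dependent counts with light profiling), and the predicted time is the dot product of those counts with per-operation costs measured once on the target machine. The operation categories and cost numbers below are invented for illustration and are not taken from the Polaris model.

        #include <stdio.h>

        /* Illustrative operation classes and per-operation costs (seconds),
         * as a microbenchmark on the target machine might provide them.
         * These numbers are made up for the example.                       */
        enum { OP_FADD, OP_FMUL, OP_LOAD, OP_STORE, N_OPS };
        static const double cost[N_OPS] = { 1.0e-9, 1.2e-9, 2.5e-9, 2.5e-9 };

        /* Predicted time = sum over operation classes of count * cost. */
        double predict(const double count[N_OPS])
        {
            double t = 0.0;
            for (int op = 0; op < N_OPS; op++)
                t += count[op] * cost[op];
            return t;
        }

        int main(void)
        {
            /* Counts for one loop nest; in a real tool they would come from
             * compile-time analysis plus light instrumentation.             */
            double count[N_OPS] = { 1e9, 1e9, 2e9, 1e9 };
            printf("predicted time: %.3f s\n", predict(count));
            return 0;
        }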

    Assertion Based Parallel Debugging

    No full text
    Programming languages have advanced tremendously over the years, but program debuggers have hardly changed. Sequential debuggers do little more than allow a user to control the flow of a program and examine its state. Parallel ones support the same operations on multiple processes, which is adequate for a small number of processors but becomes unwieldy and ineffective on very large machines. Typical scientific codes have enormous multi-dimensional data structures, and it is impractical to expect a user to view the data using traditional display techniques. In this paper we discuss the use of debug-time assertions, and show that these can be used to debug parallel programs. The techniques reduce debugging complexity because they reason about the state of large arrays without requiring the user to know the expected value of every element. Assertions can be expensive to evaluate, but their performance can be improved by running them in parallel. We demonstrate the system with a case study finding errors in a parallel version of the Shallow Water Equations, and evaluate the performance of the tool on a 4,096-core Cray XE6.
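
    A debug-time assertion of this kind can be pictured as a predicate over a whole array, evaluated in parallel so that only a violation count and the first offending index are reported rather than every element. The sketch below uses OpenMP for the parallel evaluation; it illustrates the general idea under those assumptions and is not the assertion machinery of the tool described in the paper.

        #include <stdio.h>
        #include <omp.h>

        /* Assert a property over every element of a large array in parallel:
         * return the number of violations and, via *first_bad, the smallest
         * violating index (or -1 if none).                                   */
        long assert_all(const double *a, long n, int (*ok)(double),
                        long *first_bad)
        {
            long violations = 0, first = n;
            #pragma omp parallel for reduction(+:violations) reduction(min:first)
            for (long i = 0; i < n; i++) {
                if (!ok(a[i])) {
                    violations++;
                    if (i < first) first = i;
                }
            }
            *first_bad = (violations > 0) ? first : -1;
            return violations;
        }

        /* Example property for a shallow-water height field: stays positive. */
        static int positive(double x) { return x > 0.0; }

        int main(void)
        {
            enum { N = 1000000 };
            static double h[N];
            for (long i = 0; i < N; i++) h[i] = 1.0;
            h[123456] = -0.5;                       /* planted error */

            long first_bad;
            long bad = assert_all(h, N, positive, &first_bad);
            printf("%ld violations, first at index %ld\n", bad, first_bad);
            return 0;
        }

    The user states a property of the data ("heights remain positive") instead of inspecting element values process by process, which is what makes the approach usable at scale.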

    A scalable parallel debugging library with pluggable communication protocols

    No full text
    Parallel debugging faces challenges in both scalability and efficiency. A number of advanced methods have been invented to improve the efficiency of parallel debugging. As the scale of systems increases, these methods rely heavily on a scalable communication protocol in order to be usable in large-scale distributed environments. This paper describes a debugging middleware that provides fundamental debugging functions and supports multiple communication protocols. Its pluggable architecture allows users to select an appropriate communication protocol as a plug-in for debugging on different platforms. It aims to be used by various advanced debugging technologies across different computing platforms. The performance of this debugging middleware is examined on a Cray XE supercomputer with 21,760 CPU cores.
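
    The pluggable design can be sketched as an abstract transport interface that the debugger core programs against, with each protocol (for example an MPI-based tree or a TCP fan-out) supplied as a plug-in implementing the same function table. The C interface below is a hypothetical illustration of that architecture, not the actual API of the library described in the paper.

        #include <stdio.h>
        #include <string.h>

        /* Hypothetical transport interface the debugger core programs against.
         * Each communication protocol supplies one of these function tables.  */
        typedef struct {
            const char *name;
            int  (*init)(int argc, char **argv);
            int  (*broadcast)(const void *buf, size_t len);  /* front-end -> all */
            int  (*gather)(void *buf, size_t len);           /* all -> front-end */
            void (*finalize)(void);
        } transport_t;

        /* A trivial "loopback" plug-in standing in for a real protocol. */
        static int  lo_init(int argc, char **argv) { (void)argc; (void)argv; return 0; }
        static int  lo_broadcast(const void *buf, size_t len) { (void)buf; (void)len; return 0; }
        static int  lo_gather(void *buf, size_t len) { memset(buf, 0, len); return 0; }
        static void lo_finalize(void) {}

        static const transport_t loopback = {
            "loopback", lo_init, lo_broadcast, lo_gather, lo_finalize
        };

        /* The core selects a plug-in at start-up and then issues every
         * debugger command through it, never seeing protocol details.   */
        int main(int argc, char **argv)
        {
            const transport_t *t = &loopback;    /* selection would be dynamic */
            if (t->init(argc, argv) != 0) return 1;
            const char cmd[] = "set breakpoint at main";
            t->broadcast(cmd, sizeof cmd);
            char replies[64];
            t->gather(replies, sizeof replies);
            printf("used transport: %s\n", t->name);
            t->finalize();
            return 0;
        }

    Keeping the debugger logic above this seam is what lets the same middleware run over whichever protocol scales best on a given platform.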